366 research outputs found

Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Author: Börner Katy
Emmons Scott
Gallant Mike
Kobourov Stephen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/07/2016

Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
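The three stand-alone metrics named in this abstract can be illustrated on a toy graph. The sketch below uses common textbook definitions for an undirected, unweighted graph; the paper's exact conventions (for instance, how conductance is aggregated or rescaled) may differ. The function name `cluster_quality` and the example graph are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def cluster_quality(edges, membership):
    """Return (modularity, coverage, mean conductance).

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    membership: dict mapping each node to its cluster label.
    Definitions are the common textbook ones (an assumption here).
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    cut = defaultdict(int)       # edges with exactly one endpoint in the cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    volume = defaultdict(int)    # sum of degrees inside each cluster
    for node, d in degree.items():
        volume[membership[node]] += d

    clusters = set(membership.values())
    # Modularity: internal edge fraction minus its expectation under a degree-preserving null model.
    modularity = sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters)
    # Coverage: fraction of all edges that fall inside some cluster.
    coverage = sum(internal.values()) / m
    # Conductance per cluster: cut edges over the smaller side's volume (lower is better).
    conductances = []
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductances.append(cut[c] / denom if denom else 0.0)
    return modularity, coverage, sum(conductances) / len(conductances)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
modularity, coverage, conductance = cluster_quality(edges, membership)
```

On this toy graph the natural two-cluster partition gives modularity 5/14, coverage 6/7, and mean conductance 1/7.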

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

The University of Arizona

Post-processing partitions to identify domains of modularity optimization

Author: Emmons Scott
Gibson Ryan
Mucha Peter J.
Taylor Dane
Weir William H.
Publication venue: 'MDPI AG'
Publication date: 01/08/2017

We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics. Given a set of partitions, CHAMP identifies the domain of modularity optimization for each partition (i.e., the parameter-space domain where it has the largest modularity relative to the input set), discarding partitions with empty domains to obtain the subset of partitions that are "admissible" candidate community structures that remain potentially optimal over indicated parameter domains. Importantly, CHAMP can be used for multi-dimensional parameter spaces, such as those for multilayer networks where one includes a resolution parameter and interlayer coupling. Using the results from CHAMP, a user can more appropriately select robust community structures by observing the sizes of domains of optimization and the pairwise comparisons between partitions in the admissible subset. We demonstrate the utility of CHAMP with several example networks. In these examples, CHAMP focuses attention on pruned subsets of admissible partitions that are 20-to-1785 times smaller than the sets of unique partitions obtained by community detection heuristics that were input into CHAMP. Comment: http://www.mdpi.com/1999-4893/10/3/9
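The idea of a "domain of optimization" can be sketched for the simplest case of a single resolution parameter. Assuming each partition's modularity is linear in that parameter, Q(gamma) = a - gamma * p, the admissible partitions are those on the upper envelope of the lines. The brute-force routine below is only an illustration of that idea; it is not the CHAMP implementation, and the function name, the coefficient pairs `(a, p)`, and the example values are hypothetical.

```python
def domains_of_optimization(coeffs, gamma_min, gamma_max):
    """Find which partition maximizes Q(gamma) = a - gamma * p on each sub-interval.

    coeffs: list of (a, p) pairs, one per partition (hypothetical inputs).
    Returns a list of (partition_index, start, end) covering [gamma_min, gamma_max].
    Partitions that never appear in the result have an empty domain.
    """
    n = len(coeffs)
    # Candidate breakpoints: interval ends plus every pairwise line intersection inside the interval.
    points = {gamma_min, gamma_max}
    for i in range(n):
        for j in range(i + 1, n):
            (a_i, p_i), (a_j, p_j) = coeffs[i], coeffs[j]
            if p_i != p_j:
                g = (a_i - a_j) / (p_i - p_j)
                if gamma_min < g < gamma_max:
                    points.add(g)
    cuts = sorted(points)

    domains = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        # No two lines cross strictly inside (lo, hi), so the winner at the midpoint wins throughout.
        best = max(range(n), key=lambda k: coeffs[k][0] - mid * coeffs[k][1])
        if domains and domains[-1][0] == best:
            domains[-1] = (best, domains[-1][1], hi)  # extend the previous domain
        else:
            domains.append((best, lo, hi))
    return domains


# Three hypothetical partitions; the third is never optimal on (0, 2].
coeffs = [(0.5, 0.5), (1.0, 1.0), (0.5, 0.75)]
domains = domains_of_optimization(coeffs, 0.0, 2.0)
admissible = sorted({index for index, _, _ in domains})
```

Here partition 1 is optimal for gamma in [0, 1] and partition 0 for gamma in [1, 2], so partition 2 would be pruned. A pairwise scan like this costs quadratic time in the number of partitions, which is acceptable for a sketch but not for the large partition sets the abstract describes.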

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Learning on Graphs: Supervised and Unsupervised Methods

Author: Emmons Scott
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2019

We study two methods for learning from network graph data. First, we present a novel method for the unsupervised learning problem of community detection. The proposed method is, to the best of our knowledge, the first enabling users to "zoom in" and "zoom out" on communities with varying levels of focus on network metadata. Second, we review Decagon, a system proposed by Zitnik et al. for the supervised learning task of link prediction. On a biomedical benchmark dataset, Decagon achieves state-of-the-art prediction accuracy. This work adds to the network scientist's machine learning toolkit, illustrating its power in a biomedical domain with significant public health impact. Bachelor of Scienc

Carolina Digital Repository

Sensory Regulation of C. elegans Male Mate-Searching Behavior

Author: Barrios Arantza
Emmons Scott W.
Nurrish Stephen
Publication venue: Elsevier Ltd.
Publication date: 09/12/2008

Summary: How do animals integrate internal drives and external environmental cues to coordinate behaviors? We address this question by studying mate-searching behavior in C. elegans. C. elegans males explore their environment in search of mates (hermaphrodites) and will leave food if mating partners are absent [1]. However, when mates and food coincide, male exploratory behavior is suppressed and males are retained on the food source [1]. We show that the drive to explore is stimulated by male-specific neurons in the tail, the ray neurons. Periodic contact with the hermaphrodite detected through ray neurons changes the male's behavior during periods of no contact and prevents the male from leaving the food source. The hermaphrodite signal is conveyed by male-specific interneurons that are postsynaptic to the rays and that send processes to the major integrative center in the head. This study identifies key parts of the neural circuit that regulates a sexual appetitive behavior in C. elegans.

Elsevier - Publisher Connector

PubMed Central

Image Hijacking: Adversarial Images can Control Generative Models at Runtime

Author: Bailey Luke
Emmons Scott
Ong Euan
Russell Stuart
Publication date: 31/08/2023

Are foundation models secure from malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control generative models at runtime. We introduce Behavior Matching, a general method for creating image hijacks, and we use it to explore three types of attacks. Specific string attacks generate arbitrary output of the adversary's choosing. Leak context attacks leak information from the context window into the output. Jailbreak attacks circumvent a model's safety training. We study these attacks against LLaVA-2, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all our attack types have above a 90% success rate. Moreover, our attacks are automated and require only small image perturbations. These findings raise serious concerns about the security of foundation models. If image hijacks are as difficult to defend against as adversarial examples in CIFAR-10, then it might be many years before a solution is found -- if it even exists. Comment: Code is available at https://github.com/euanong/image-hijack

arXiv.org e-Print Archive

Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae

Author: Cornman R Scott
Dunn W Augustine
Emmons Aaron C
He Ningjia
Togawa Toru
Willis Judith H
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The most abundant family of insect cuticular proteins, the CPR family, is recognized by the R&R Consensus, a domain of about 64 amino acids that binds to chitin and is present throughout arthropods. Several species have now been shown to have more than 100 CPR genes, inviting speculation as to the functional importance of this large number and diversity. Results: We have identified 156 genes in Anopheles gambiae that code for putative cuticular proteins in this CPR family, over 1% of the total number of predicted genes in this species. Annotation was verified using several criteria including identification of TATA boxes, INRs, and DPEs plus support from proteomic and gene expression analyses. Two previously recognized CPR classes, RR-1 and RR-2, form separate, well-supported clades with the exception of a small set of genes with long branches whose relationships are poorly resolved. Several of these outliers have clear orthologs in other species. Although both clades are under purifying selection, the RR-1 variant of the R&R Consensus is evolving at twice the rate of the RR-2 variant and is structurally more labile. In contrast, the regions flanking the R&R Consensus have diversified in amino-acid composition to a much greater extent in RR-2 genes compared with RR-1 genes. Many genes are found in compact tandem arrays that may include similar or dissimilar genes but always include just one of the two classes. Tandem arrays of RR-2 genes frequently contain subsets of genes coding for highly similar proteins (sequence clusters). Properties of the proteins indicated that each cluster may serve a distinct function in the cuticle. Conclusion: The complete annotation of this large gene family provides insight on the mechanisms of gene family evolution and clues about the need for so many CPR genes. These data also should assist annotation of other Anopheles genes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

imitation: Clean Imitation Learning Implementations

Author: Belrose Nora
Emmons Scott
Ernestus Maximilian
Gleave Adam
Jenner Erik
Rocamonde Juan
Russell Stuart
Taufeeque Mohammad
Toyer Sam
Wang Steven H.
Publication date: 21/11/2022

imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a modular fashion, making it simple to develop novel algorithms in the framework. Our source code, including documentation and examples, is available at https://github.com/HumanCompatibleAI/imitatio

arXiv.org e-Print Archive